Classification of web robots: An empirical study based on over one billion requests

نویسندگان

  • Junsup Lee
  • Sung Deok Cha
  • Dongkun Lee
  • Hyungkyu Lee
چکیده

Many studies on detection and classification of web robots have focused their attention mostly on text crawlers, and empirical experiments used relatively small data collected at universities. In this paper, we analyzed more than one billion requests to www.microsoft. com in 24 h. Web logs were made anonymous to eliminate potential privacy concerns while preserving essential characteristics (e.g., frequency, queries, etc). We have developed an effective characterization metrics, based on workload characteristics and resource types, in detecting and classifying various web robots including text crawlers, link checkers, and icon crawlers. As expected, web robot behavior was clearly different from that of typical interactive users, and different types of web robots also exhibited different characteristics. However, comparison of the similar type of web robots, text crawlers in particular, revealed different characteristics, thereby enabling characterization with reasonably high confidence level. We divided various feature metrics into five groups, and effectiveness of each group in classification is shown in polar diagram in the decreasing order of effectiveness in the clockwise direction. One can use the findings to classify likely identify of unknown web robots, and organizations can develop appropriate measures to deal with them. Our analysis is based on recent web log data collected at one of the best known site which offers truly global service. Crown Copyright a 2009 Published by Elsevier Ltd. All rights reserved.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A density based clustering approach to distinguish between web robot and human requests to a web server

Today world's dependence on the Internet and the emerging of Web 2.0 applications is significantly increasing the requirement of web robots crawling the sites to support services and technologies. Regardless of the advantages of robots, they may occupy the bandwidth and reduce the performance of web servers. Despite a variety of researches, there is no accurate method for classifying huge data ...

متن کامل

Web Robot Detection based on Monotonous Behavior

Several studies examined various features on how to most effectively detect web robots. Based on an insight that most web robots, regardless of specifics, would exhibit focused and therefore monotonous behavior, this paper proposes that monitoring the rate of behavioral change is highly effective in detecting sessions initiated by web robots. Empirical evaluation performed on more than one bill...

متن کامل

Identification and Classification of Desirable Web-Based Services from the Perspective of Website Users of Iran’s Hospitals Based on Kano Model of Customer Satisfaction

Background and Aim: A hospital website is an appropriate system for exchanging information and connecting patients, hospitals and medical staff. The purpose of this study was to identify and classify desirable web-based services in websites of Iran's hospitals based on Kano’s Customer Satisfaction Model. Materials and Methods: This was a survey study. The statistical population of the study co...

متن کامل

Anomaly-based Web Attack Detection: The Application of Deep Neural Network Seq2Seq With Attention Mechanism

Today, the use of the Internet and Internet sites has been an integrated part of the people’s lives, and most activities and important data are in the Internet websites. Thus, attempts to intrude into these websites have grown exponentially. Intrusion detection systems (IDS) of web attacks are an approach to protect users. But, these systems are suffering from such drawbacks as low accuracy in ...

متن کامل

Fuzzy Motion Control for Wheeled Mobile Robots in Real-Time

Due to various advantages of Wheeled Mobile Robots (WMRs), many researchers have focused to solve their challenges. The automatic motion control of such robots is an attractive problem and is one of the issues which should carefully be examined. In the current paper, the trajectory tracking problem of WMRs which are actuated by two independent electrical motors is deliberated. To this end, and ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Computers & Security

دوره 28  شماره 

صفحات  -

تاریخ انتشار 2009